By Yifan Ding
This dataset was collected from Guangzhou Women and Children's Medical Center, Guangzhou, and consists of chest X-ray images of children aged 1-5, some of whom had bacterial or viral pneumonia. The images were labeled by two doctors, with a third acting as tiebreaker when the two disagreed. The original images come in different sizes, so we resized each image to a fixed size and converted it to grayscale (80x120 pixels in the code below). The following figure, from the Cell paper "Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning", shows the distinction between a normal lung, a lung with bacterial pneumonia, and a lung with viral pneumonia. https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
from IPython.display import Image
Image(filename = "/Users/xuechenli/Downloads/lessons/7324/lab2/Lung_Classification.png" )
The purpose of this dataset is to develop an artificial neural network able to distinguish children with pneumonia, in order to assist doctors in making the right decision. There are three classifications: normal, pneumonia-bacterial, and pneumonia-viral. The focus of the Cell paper, "Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning", is transfer learning, a technique for training a neural network with a relatively small number of images (Kermany 1122). Our purpose is to classify the aforementioned three types as a preliminary screen. The children will still receive medical care and a second opinion from a doctor, so the stakes are not quite as high for our algorithm.
There are three different classifications for our image data. The prediction task is to distinguish between children with a normal lung, a lung with bacterial pneumonia, and a lung with viral pneumonia. This algorithm would be used by hospitals that have an X-ray machine and serve children between the ages of 1 and 5. Though this data was collected from Chinese children, it can probably be used for children of other nationalities as well.
According to the Cell paper, data collected by the World Health Organization shows that pneumonia kills approximately 2 million children under 5 years old every year, though most of these deaths occur in Southeast Asia or Africa (Kermany 1127). Since chest X-rays are common and can be used to distinguish kids with pneumonia from kids without pneumonia, X-rays were chosen as the method of choice. If we could develop an accurate and quick classifier, it could potentially be used wherever X-rays are taken. Such an algorithm could be used by nurses and would not require a doctor to analyze the chest X-ray images, saving doctors' time and, potentially, the children who are suffering from pneumonia. The algorithm would screen kids with pneumonia and direct them to the needed medical care: antibiotics if the child had bacterial pneumonia, supportive care if the child had viral pneumonia, and discharge if the child does not have pneumonia.
In order for our algorithm to be useful, it would have to improve in some way on the neural network that has already been built with the same data. This could mean that our algorithm achieves a greater accuracy than their 93%, or a larger area under the ROC curve than theirs (Kermany 1127). The ROC curve measures the performance of an algorithm by graphing the true positive rate against the false positive rate; the closer the curve hugs the top-left corner, the better the algorithm. The algorithm in the paper achieved the following ROC curve:
Image(filename = "/Users/xuechenli/Downloads/lessons/7324/lab2/ROC_Curve.png" )
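To make the ROC description concrete, here is a small sketch of how an ROC curve and its area can be computed with scikit-learn's `roc_curve` and `roc_auc_score`. The scores below are invented for illustration; they are not outputs of our classifier.

```python
# Hypothetical scores for a binary "pneumonia vs. normal" decision.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])                       # 1 = pneumonia
y_score = np.array([0.1, 0.65, 0.35, 0.2, 0.8, 0.7, 0.9, 0.6])    # predicted P(pneumonia)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print('AUC: {:.4f}'.format(auc))  # one negative outranks one positive -> AUC = 15/16
# The closer the (fpr, tpr) curve hugs the top-left corner, the better the classifier.
```

The AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which is why the single mis-ranked pair above costs exactly 1/16.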
In all classification problems, it is important to consider which is worse: false positives or false negatives. Here, we define a false positive as the algorithm predicting that a child has pneumonia when he or she doesn't, and a false negative as the classifier predicting that the child does not have pneumonia when the child does. It is clear that we want to limit the number of false negatives, even at the cost of more false positives. With a false positive, the child will simply be placed under closer supervised care, and if the child does not have pneumonia, this will probably be discovered in time. With a false negative, though, the child may leave the hospital even though he or she has pneumonia. Clearly, we will accept more false positives in exchange for fewer false negatives.
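As an illustration of these definitions, a toy sketch (hypothetical labels, not our data) showing how false positives and false negatives fall out of scikit-learn's confusion matrix:

```python
# Toy binary example, where 1 = "has pneumonia"; labels invented for illustration.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]   # one missed case, one false alarm

# with labels (0, 1), ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print('false positives:', fp)  # healthy child flagged -> extra supervised care
print('false negatives:', fn)  # sick child sent home -> the error we must minimize
```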
Kermany, D., Goldbaum, M., Cai, W., Valentim, C., Liang, H., Baxter, S., McKeown, A., Yang, G., Wu, X., Yan, F., Dong, J., Prasadha, M., Pei, J., Ting, M., Zhu, J., Li, C., Hewett, S., Dong, J., Ziyar, I., Shi, A., Zhang, R., Zheng, L., Hou, R., Shi, W., Fu, X., Duan, Y., Huu, V., Wen, C., Zhang, E., Zhang, C., Li, O., Wang, X., Singer, M., Sun, X., Xu, J., Tafreshi, A., Lewis, M., Xia, H. and Zhang, K. (2018). Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell, 172(5), pp.1122-1131.e9.
Kaggle Dataset: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
#!pip install opencv-python
# Step 1: Import Modules
import os
import sys
import time
import gc
import math
from os import listdir
from glob import glob

import numpy as np
import pandas as pd
import cv2
import skimage
from tqdm import tqdm
import seaborn as sns

import keras
import keras.backend as K
from keras.regularizers import l2
from keras.wrappers.scikit_learn import KerasClassifier
from keras.callbacks import EarlyStopping, Callback, ModelCheckpoint
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
from keras.preprocessing.image import img_to_array, load_img, ImageDataGenerator
from keras.utils.np_utils import to_categorical

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
%matplotlib inline

import plotly
plotly.offline.init_notebook_mode()

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

from sklearn import metrics
from sklearn import metrics as mt
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, precision_score, roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

import warnings
warnings.filterwarnings('ignore')

print('Python:', sys.version)
print('Pandas:', pd.__version__)
print('Numpy:', np.__version__)
print('Keras:', keras.__version__)
w,h=80,120
folder1 = "/Users/yifan/Desktop/train/NORMAL/"
normal = [f for f in os.listdir(folder1) if os.path.isfile(os.path.join(folder1, f))]
folder2 = "/Users/yifan/Desktop/train/PNEUMONIA/"
pneumonia = [f for f in os.listdir(folder2) if os.path.isfile(os.path.join(folder2, f))]
folder = "/Users/yifan/Desktop/train/"
chest = normal + pneumonia
print("Working with {0} images".format(len(chest)))
#Code from https://github.com/deadskull7/Pneumonia-Diagnosis-using-XRays-96-percent-Recall/blob/master/Pneumonia%20Diagnosis%20using%20Lung's%20XRay%20.ipynb
def read_data(folder):
    images = []
    labels = []  # true class for each image: 0 = normal, 1 = bacteria, 2 = virus
    for dirc in os.listdir(folder):
        readin = folder + dirc
        if not dirc.startswith('.'):
            if dirc in ['NORMAL']:
                for image_name in tqdm(os.listdir(readin)):
                    label = 0
                    labels.append(label)
            elif dirc in ['PNEUMONIA']:
                for image_name in tqdm(os.listdir(readin)):
                    if 'bacteria' in str(image_name):
                        label = 1
                    elif 'virus' in str(image_name):
                        label = 2
                    labels.append(label)
            for image_name in tqdm(os.listdir(readin)):
                img = cv2.imread(readin + '/' + image_name)  # read image from folder
                if img is not None:
                    img = skimage.transform.resize(img, (w, h, 3))  # resize each image to w x h (80x120)
                    img = np.asarray(img)  # turn the image into an array
                    img = ((img / 255.) - .5) * 2  # standardize pixel values to [-1, 1]
                    images.append(img)
    images = np.asarray(images)
    labels = np.asarray(labels)
    return images, labels
chest_images, chest_ture = read_data(folder)
print(chest_images.shape)
print(chest_ture.shape)
print(chest_ture)
chest_labels = []
for label in chest_ture:
    if label == 0:
        chest_labels.append('normal')
    elif label == 1:
        chest_labels.append('bacteria')
    else:
        chest_labels.append('virus')
Luminance is far more important than chrominance in distinguishing visual features. A good way to illustrate this property is to take a given image and separate the luminance plane from the chrominance planes. We use 0.3*R + 0.59*G + 0.11*B to convert all the images to grayscale. Since the images were already standardized above, the grayscale values lie in [-1, 1] rather than the original 0-255 range.
def gray_scale(data):
    '''
    input: an np.array of images in RGB format
    output: an np.array of images in grayscale format
    '''
    n_images = data.shape[0]
    n_rows = data.shape[1]
    n_columns = data.shape[2]
    grayscale = np.zeros((n_images, n_rows, n_columns, 1))
    for idx in range(n_images):
        # note: np.add(a, b, c) treats c as the output array, so the three
        # weighted channels must be summed with + (blue weight is 0.11)
        grayscale[idx, :, :, 0] = (0.3 * data[idx, :, :, 0] +
                                   0.59 * data[idx, :, :, 1] +
                                   0.11 * data[idx, :, :, 2])
    return grayscale
chest_gray = gray_scale(chest_images)
print(chest_gray.shape)
def linearize(data):
    '''
    input: an np.array of images
    output: a 2-D np.array (one flattened image per row)
    '''
    num_images = data.shape[0]
    num_columns = int(np.prod(data.shape) / num_images)
    linear = np.reshape(data, (num_images, num_columns))
    return linear
chest_gray_linear = linearize(chest_gray)
chest_rgb_linear = linearize(chest_images)
print(chest_gray_linear.shape)
We display some sample images from the three groups.
%matplotlib inline
plt.style.use('ggplot')
# a helper plotting function
def plot_gallery(images, titles, h, w, n_row=3, n_col=6):
    """Helper function to plot a gallery of images"""
    plt.figure(figsize=(1.7 * n_col, 2.3 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())
plot_gallery(chest_gray, chest_labels,w,h)
From the images of normal and pneumonia chests, we can hardly tell the difference by eye. This indicates that further analysis of the images is essential for this task.
i = 0
j = 0
k = 0
for label in chest_ture:
    if label == 1:
        i = i + 1
    elif label == 2:
        j = j + 1
    else:
        k = k + 1
print('bacteria:', i,
      'virus:', j,
      'normal:', k)
Each column in the dataset represents a specific pixel and each row represents one image, for a total of 5216 images. Each value represents the grayscale darkness of that pixel. Our target is a 3-class output: normal, bacteria, and virus.
Raschka, S. and Mirjalili, V. (n.d.). Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. 2nd ed. Packt Publishing.
from sklearn.decomposition import PCA
import seaborn as sns
from sklearn.decomposition import IncrementalPCA
from sklearn.decomposition import KernelPCA
from sklearn.decomposition import SparsePCA
def full_pca(data, n):
    '''
    input: data, n_components
    output: PCA fit on the data
    '''
    chest_pca = PCA(n_components=n)
    return chest_pca.fit(data)  # fit on the argument, not the global chest_gray_linear
chest_pca = full_pca(chest_gray_linear,5216)
def explain_variance(pca):
    '''
    input: pca
    output: explained variance ratio per component
    '''
    explained_var = pca.explained_variance_ratio_
    return explained_var

def cumu_variance(pca):
    '''
    input: pca
    output: cumulative explained variance
    '''
    cumu_var_exp = np.cumsum(pca.explained_variance_ratio_)
    return cumu_var_exp
cumu_var = cumu_variance(chest_pca)
explained_var = explain_variance(chest_pca)
print(cumu_var)
print(explained_var)
#!pip install cufflinks plotly
#The plot_explained_variance function is adapted from Eric's 04. Dimension Reduction and Images
from plotly.graph_objs import Scatter, Marker, Layout, XAxis, YAxis, Bar, Line
import plotly
def plot_explained_variance(var1, var2):
    plotly.offline.iplot({
        "data": [Scatter(y=var1, name='Explained variance'),
                 Scatter(y=var2, name='Cumulative explained variance')],
        "layout": Layout(xaxis=XAxis(title='Principal components'),
                         yaxis=YAxis(title='Explained variance ratio'))
    })
plot_explained_variance(explained_var,cumu_var)
Retaining 40 components gives an explained variance ratio of about 0.8, and retaining 200 components gives about 0.9. Since 200 is at the knee of the curve, it is appropriate to use the first 200 components to represent the chest images.
chest_pca_first200 = full_pca(chest_gray_linear,200)
cumu_var = cumu_variance(chest_pca_first200)
explained_var = explain_variance(chest_pca_first200)
plot_explained_variance(explained_var,cumu_var)
eigen_chest = chest_pca_first200.components_.reshape(200,w,h)
eigen_chest.shape
#The plot_gallery function is from Eric's 04. Dimension Reduction and Images
import matplotlib.pyplot as plt
# a helper plotting function
def plot_gallery(images, titles, h, w, n_row=4, n_col=6):
    """
    input: image matrix
    output: image gallery
    """
    plt.figure(figsize=(1.7 * n_col, 2.3 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())
eigenlabels = ['eigenimage ' + str(i) for i in range(eigen_chest.shape[0])]
plot_gallery(eigen_chest,eigenlabels,w,h)
The components represent our images well. The first component resembles an average of the 5216 images.
#Reconstruct_image function is adapated from Eric's 04 Dimensional Reduction and Images
''' original from Eric
def reconstruct_image(trans_obj,org_features):
low_rep = trans_obj.transform(org_features)
rec_image = trans_obj.inverse_transform(low_rep)
return low_rep, rec_image
idx_to_reconstruct = 131
chest_gray_linear_idx = chest_gray_linear[idx_to_reconstruct]
low_dim, reconstructed_image = reconstruct_image(chest_pca_first200,chest_gray_linear_idx.reshape(1, -1))
'''
def reconstruct_image(trans_obj, pca_features, idx):
    '''
    input: trans_obj (fitted PCA), pca_features (linearized images), idx
    output: low-dimensional representation and reconstruction of the chosen image
    '''
    low_dim = trans_obj.transform(pca_features[idx].reshape(1, -1))
    rec_image = trans_obj.inverse_transform(low_dim)
    return low_dim, rec_image
chest_gray_linear.shape
chest_pca_first200.components_.shape
#Make a comparison here
#Take the 100th image as an example
low_dim, rec_image = reconstruct_image(chest_pca_first200,chest_gray_linear,100)
plt.subplot(1, 2, 1)
plt.imshow(chest_gray_linear[100].reshape(w, h), cmap=plt.cm.gray)  # images are w x h (80x120)
plt.title('Original')
plt.grid(False)
plt.subplot(1, 2, 2)
plt.imshow(rec_image.reshape(w, h), cmap=plt.cm.gray)
plt.title('200-component PCA reconstruction')
plt.grid(False)
As a check on our code, we computed radial basis (RBF) kernel PCA as a negative case to see whether it is the worst among these PCA methods. From the image analysis, we expected it to be the worst, and indeed it only reaches an accuracy of about 72%.
Comparing PCA and kernel PCA with the first 200 components, it is clear that linear PCA and polynomial kernel PCA do best, with more than 95% accuracy, while RBF kernel PCA does worst, with only about 75% accuracy.
Above all, we would use the full PCA with first 200 components to be the final dataset that is used for classification.
# Data to plot
labels = 'bacteria','virus','normal'
sizes = [2530,1345,1341]
colors = ['gold', 'yellowgreen', 'lightcoral']
# Plot
plt.pie(sizes,labels=labels, colors=colors,
autopct='%1.1f%%', shadow=True, startangle=140)
plt.show()
As you can see above, we have 1341 normal samples, 2530 pneumonia-bacteria samples, and 1345 pneumonia-viral samples; the proportions of the three classes are quite different. So we want to keep the same proportion of classes in each fold, to make sure each fold is a good representative of the whole data. Another issue is sample size. From the counts of the three categories, the sample sizes for normal and pneumonia-viral are small. After an 80/20 train/test split, we would have only 1341/5 ≈ 268 normal samples in the test set. That is not enough, and we could get almost any performance on such a set due to chance alone. In k-fold cross validation, the data is divided into k subsets.
The holdout method is then repeated k times, such that each time one of the k subsets is used as the test/validation set and the other k-1 subsets are put together to form the training set. The error estimate is averaged over all k trials to measure the total effectiveness of the model. Every data point is in a validation set exactly once and in a training set k-1 times. This significantly reduces bias, since we use most of the data for fitting, and also reduces variance, since most of the data is also used for validation. As a general rule backed by empirical evidence, k = 5 or 10 is preferred, because statistical performance does not increase much for larger values of k. So for our problem, a 5- or 10-fold cross validation with an 80/20 split is a better approach. Since we also want the same class proportions in each fold, stratified 10-fold is the better choice.
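A quick sketch (with toy labels, not our chest data) confirms that `StratifiedKFold` preserves the class proportions in every fold:

```python
# Imbalanced 3-class toy labels: 50% / 30% / 20%, mimicking our skewed classes.
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 50 + [1] * 30 + [2] * 20)
X = np.zeros((len(y), 1))  # features are irrelevant to the split itself

skf = StratifiedKFold(n_splits=5)
for _, test_idx in skf.split(X, y):
    counts = np.bincount(y[test_idx], minlength=3)
    print(counts)  # every fold holds 10/6/4 samples: the same 50/30/20 mix
```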
Arlot, Sylvain, and Alain Celisse. "A Survey of Cross-Validation Procedures for Model Selection." Statistics Surveys, https://projecteuclid.org/download/pdfview_1/euclid.ssu/1268143839.
Gupta, Prashant. “Cross-Validation in Machine Learning.” Medium, Towards Data Science, 5 June 2017, https://towardsdatascience.com/cross-validation-in-machine-learning-72924a69872f.
Shulga, Dima. “5 Reasons Why You Should Use Cross-Validation in Your Data Science Projects.” Medium, Towards Data Science, 27 Sept. 2018, https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79.
We evaluate our metric with 10-fold stratified cross validation.
from sklearn.model_selection import StratifiedKFold,cross_val_score
from sklearn import metrics
chest_skf=StratifiedKFold(n_splits=10)
X_train = []
X_test = []
y_train = []
y_test = []
for train_index, test_index in chest_skf.split(chest_gray_linear, chest_ture):
    print("Train Index:", train_index, ", Test Index:", test_index)
    X_train_temp, X_test_temp = chest_gray_linear[train_index], chest_gray_linear[test_index]
    y_train_temp, y_test_temp = chest_ture[train_index], chest_ture[test_index]
    X_train.append(X_train_temp)
    X_test.append(X_test_temp)
    y_train.append(y_train_temp)
    y_test.append(y_test_temp)
# scale the numeric, continuous variables
from sklearn.preprocessing import StandardScaler
for i in range(10):
    ss = StandardScaler()
    X_train[i] = ss.fit_transform(X_train[i])
    X_test[i] = ss.transform(X_test[i])
print(X_train[0])
train_pca_200=[]
test_pca_200=[]
for i in range(10):
    train_temp = full_pca(X_train[i], 200)
    eigen_temp = train_temp.components_
    eigen_temp = np.transpose(eigen_temp)
    train_temp = X_train[i] @ eigen_temp
    test_temp = X_test[i] @ eigen_temp
    train_pca_200.append(train_temp)
    test_pca_200.append(test_temp)
for i in range(10):
    print(i + 1, train_pca_200[i].shape, test_pca_200[i].shape)
CLASSES = 3
y_train_ohe=[]
y_test_ohe=[]
for i in range(10):
    y_train[i] = np.array(y_train[i])
    y_test[i] = np.array(y_test[i])
    y_train_temp = keras.utils.to_categorical(y_train[i], CLASSES)
    y_test_temp = keras.utils.to_categorical(y_test[i], CLASSES)
    y_train_ohe.append(y_train_temp)
    y_test_ohe.append(y_test_temp)
As we mentioned in the False Positive vs. False Negative trade-off, it is important in all classification problems to consider which is worse: false positives or false negatives. Here, a false positive is when the algorithm predicts that a child has pneumonia when he or she does not, and a false negative is when the classifier predicts that a child does not have pneumonia when the child does. Clearly we want to limit the number of false negatives; we also want to spare children unnecessary treatment, which is what a false positive would cause.
Since a higher recall means fewer false negatives, and we also care about keeping false positives low, we should use the F1-score as our main metric. The F1-score is defined for binary classes; there are two ways to extend it to multiple classes. Micro-averaging pools the individual TPs, TNs, FPs, and FNs of the confusion matrix, weighting each instance equally. Macro-averaging takes the average of the per-class scores, weighting each class equally in the overall performance. Since our classes are imbalanced, we prefer the macro-averaged F1-score.
Given the per-class metric $X_k$ for classes $k = 1, \dots, K$: $$F1_{micro} = \frac {2\times (TP_1 + ... + TP_K) } {2\times (TP_1 + ... + TP_K) + FP_1 + ... + FP_K + FN_1 + ... + FN_K} $$
$$F1_{macro} = \frac {X_1 + ... + X_K} {K} $$
Guglielmocamporese. "Macro F1-Score Keras." Kaggle, 20 Oct. 2018, https://www.kaggle.com/guglielmocamporese/macro-f1-score-keras.
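A toy illustration (invented labels, not our data) of how micro- and macro-averaged F1 diverge when classes are imbalanced and a classifier ignores the rare classes:

```python
# 10 samples: class 0 dominates; the classifier predicts class 0 everywhere.
from sklearn.metrics import f1_score

y_true = [0] * 8 + [1] + [2]
y_pred = [0] * 10

micro = f1_score(y_true, y_pred, average='micro')
macro = f1_score(y_true, y_pred, average='macro')
print('micro F1:', micro)  # 0.8: each instance weighted equally (= accuracy here)
print('macro F1:', macro)  # ~0.30: the two ignored rare classes drag it down
```

Because macro-averaging punishes exactly the failure mode we worry about (missing the small pneumonia-viral class), it is the appropriate choice for this problem.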
# modified by https://www.kaggle.com/guglielmocamporese/macro-f1-score-keras
import keras.backend as K
import tensorflow as tf
from tensorflow import math
def f1(y_true, y_pred):
    y_pred = K.round(y_pred)
    tp = K.sum(K.cast(y_true * y_pred, 'float'), axis=0)
    # tn = K.sum(K.cast((1 - y_true) * (1 - y_pred), 'float'), axis=0)
    fp = K.sum(K.cast((1 - y_true) * y_pred, 'float'), axis=0)
    fn = K.sum(K.cast(y_true * (1 - y_pred), 'float'), axis=0)
    p = tp / (tp + fp + K.epsilon())
    r = tp / (tp + fn + K.epsilon())
    f1 = 2 * p * r / (p + r + K.epsilon())
    f1 = tf.where(tf.math.is_nan(f1), tf.zeros_like(f1), f1)
    return K.mean(f1)
Class notebooks: https://github.com/eclarson/MachineLearningNotebooks
# make a keras MLP
for i in range(10):
    print('fold: ' + str(i + 1))
    mlp = Sequential()
    mlp.add(Dense(input_dim=train_pca_200[i].shape[1], units=100, activation='relu'))
    mlp.add(Dense(units=50, activation='relu'))
    mlp.add(Dense(units=50, activation='relu'))
    mlp.add(Dense(CLASSES))
    mlp.add(Activation('softmax'))
    mlp.compile(loss='categorical_crossentropy',
                optimizer='rmsprop',
                metrics=['accuracy', f1])
    mlp_simple = mlp.fit(train_pca_200[i], y_train_ohe[i],
                         batch_size=32, epochs=10,
                         shuffle=True, verbose=1,
                         validation_data=(test_pca_200[i], y_test_ohe[i]))
epochs = 10
w,h=10,20 #Reshape to 10*20
input_shape = (w, h, 1)
Feature standardization is not needed here since we have already normalized our data. ZCA whitening is not necessary either, because we are using PCA. We keep random rotations to train the model to better handle images, and we keep random flips to improve performance, since we believe our images can be flipped. Random shifts might not make any difference in our case.
Brownlee, Jason. “Image Augmentation for Deep Learning With Keras.” Machine Learning Mastery, 12 Sept. 2019, https://machinelearningmastery.com/image-augmentation-deep-learning-keras/.
from keras.models import Sequential
from keras.layers import Reshape
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from sklearn.metrics import f1_score
mlp = Sequential()
mlp.add(Dense(input_dim=train_pca_200[0].shape[1], units=100, activation='relu'))  # every fold has 200 features
mlp.add(Dense(units=50, activation='relu'))
mlp.add(Dense(units=50, activation='relu'))
mlp.add(Dense(CLASSES))
mlp.add(Activation('softmax'))
mlp.compile(loss='categorical_crossentropy',
            optimizer='rmsprop',
            metrics=['accuracy', f1])
# make a CNN with conv layer and max pooling
cnn = Sequential()
cnn.add( Reshape((1,w,h), input_shape=(1,w*h)) )
cnn.add( Conv2D(filters=16, kernel_size= (2, 2), padding='same', input_shape=(1,w,h),
data_format="channels_first") )
cnn.add( Activation('relu') )
cnn.add( MaxPooling2D(pool_size=(2, 2), data_format="channels_first") )
# add one layer on flattened output
cnn.add( Flatten() )
cnn.add( Dense(CLASSES) )
cnn.add( Activation('softmax') )
cnn.summary()
cnn.compile(loss='mean_squared_error',
optimizer='rmsprop',
metrics=['accuracy',f1])
from sklearn import metrics as mt
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline
def compare_mlp_cnn(cnn, mlp, X_test, y_test):
    plt.figure(figsize=(15, 5))
    if cnn is not None:
        yhat_cnn = np.argmax(cnn.predict(np.expand_dims(X_test, axis=1)), axis=1)
        f1_cnn = mt.f1_score(y_test, yhat_cnn, average='macro')  # macro F1 score
        plt.subplot(1, 2, 1)
        cm = mt.confusion_matrix(y_test, yhat_cnn)
        cm = cm / np.sum(cm, axis=1)[:, np.newaxis]
        sns.heatmap(cm, annot=True, fmt='.2f')
        plt.title('CNN: ' + str(f1_cnn))
    if mlp is not None:
        yhat_mlp = np.argmax(mlp.predict(X_test), axis=1)
        f1_mlp = mt.f1_score(y_test, yhat_mlp, average='macro')  # macro F1 score
        plt.subplot(1, 2, 2)
        cm = mt.confusion_matrix(y_test, yhat_mlp)
        cm = cm / np.sum(cm, axis=1)[:, np.newaxis]
        sns.heatmap(cm, annot=True, fmt='.2f')
        plt.title('MLP: ' + str(f1_mlp))
def contingency_column(cnn, X_test, y_test):
    yhat_cnn = np.argmax(cnn.predict(np.expand_dims(X_test, axis=1)), axis=1)
    return [yhat_cnn[i] == y_test[i] for i in range(len(y_test))]
def contingency_column_mlp(mlp, X_test, y_test):  # parameter renamed: this takes the MLP, not the CNN
    yhat_mlp = np.argmax(mlp.predict(X_test), axis=1)
    return [yhat_mlp[i] == y_test[i] for i in range(len(y_test))]
# Let's train the model
cnn_history = []
mlp_history = []
history_first = []
mlp_his = []
classifier_cnn_first = []
classifier_mlp_first = []
for i in range(10):
    print('Fold: ' + str(i + 1))
    # simple cnn
    # we need to expand the dimensions here to give the
    # "channels" dimension expected by Keras
    history = cnn.fit(np.expand_dims(train_pca_200[i], axis=1), y_train_ohe[i],
                      batch_size=32, epochs=epochs,
                      shuffle=True, verbose=0,
                      validation_data=(np.expand_dims(test_pca_200[i], axis=1), y_test_ohe[i]))
    cnn_history.append(cnn)
    history_first.append(history)
    # mlp
    print('MLP')
    mlp_model = mlp.fit(train_pca_200[i], y_train_ohe[i],
                        batch_size=32, epochs=10,
                        shuffle=True, verbose=1,
                        validation_data=(test_pca_200[i], y_test_ohe[i]))
    mlp_history.append(mlp)
    mlp_his.append(mlp_model)
    classifier_cnn_first.append(contingency_column(cnn_history[i], test_pca_200[i], y_test[i]))
    classifier_mlp_first.append(contingency_column_mlp(mlp_history[i], test_pca_200[i], y_test[i]))
    compare_mlp_cnn(cnn_history[i], mlp_history[i], test_pca_200[i], y_test[i])
From the above we can see that our cnn1 model does not perform well on the third class, pneumonia-viral, which has a much lower F1 score than the MLP model achieves.
history_first[0].history['val_f1']
np.expand_dims(train_pca_200[0], axis=1).shape
%%time
# changes:
# 1. increased kernel size
cnn2 = Sequential()
cnn2.add( Reshape((1,w,h), input_shape=(1,200)) )
cnn2.add( Conv2D(filters=16, kernel_size= (3, 3),
padding='same', input_shape=(1,w,h),
data_format="channels_first") )
cnn2.add( Activation('relu') )
cnn2.add( MaxPooling2D(pool_size=(2, 2), data_format="channels_first") )
# add one layer on flattened output
cnn2.add( Flatten() )
cnn2.add( Dense(CLASSES, activation='softmax') )
# Let's train the model
cnn2.compile(loss='mean_squared_error',
optimizer='rmsprop',
metrics=['accuracy',f1])
# we need to expand the dimensions here to give the
# "channels" dimension expected by Keras
cnn2_history = []
history_second = []
classifier_2 = []
for i in range(10):
    print("fold: " + str(i + 1))
    history = cnn2.fit(np.expand_dims(train_pca_200[i], axis=1), y_train_ohe[i],
                       batch_size=32, epochs=epochs,
                       shuffle=True, verbose=0,
                       validation_data=(np.expand_dims(test_pca_200[i], axis=1), y_test_ohe[i]))
    cnn2_history.append(cnn2)
    history_second.append(history)
    classifier_2.append(contingency_column(cnn2_history[i], test_pca_200[i], y_test[i]))
    compare_mlp_cnn(cnn2_history[i], mlp_history[i], test_pca_200[i], y_test[i])
From the above we can see that our cnn2 model also does not perform well on the third class, pneumonia-viral, which has a much lower F1 score than the MLP model achieves.
history_second[0].history['val_f1']
%%time
# changes:
# 1. increased kernel size
# 2. add another conv/pool layer
# 3. increase filter_layer from 16 to 32
cnn3 = Sequential()
cnn3.add( Reshape((1,w,h), input_shape=(1,200)))
num_filt_layers = [32, 32]
for num_filters in num_filt_layers:
    cnn3.add(Conv2D(filters=num_filters,
                    kernel_size=(3, 3),
                    padding='same', data_format="channels_first"))
    cnn3.add(Activation('relu'))
    cnn3.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_first"))
# add one layer on flattened output
cnn3.add( Flatten() )
cnn3.add( Dense(CLASSES) )
cnn3.add( Activation('softmax') )
# Let's train the model
cnn3.compile(loss='mean_squared_error',
optimizer='rmsprop',
metrics=['accuracy',f1])
# we need to expand the dimensions here to give the
# "channels" dimension expected by Keras
cnn3_history = []
history_third = []
classifier_3 = []
for i in range(10):
    print("fold: " + str(i + 1))
    history = cnn3.fit(np.expand_dims(train_pca_200[i], axis=1), y_train_ohe[i],
                       batch_size=32, epochs=epochs,
                       shuffle=True, verbose=0,
                       validation_data=(np.expand_dims(test_pca_200[i], axis=1), y_test_ohe[i]))
    cnn3_history.append(cnn3)
    history_third.append(history)
    classifier_3.append(contingency_column(cnn3_history[i], test_pca_200[i], y_test[i]))
    compare_mlp_cnn(cnn3_history[i], mlp_history[i], test_pca_200[i], y_test[i])
From the above we can see that our cnn3 model performs comparably well across all three classes.
history_third[0].history['val_f1']
%%time
# changes:
# 1. increased kernel size
# 2. add another conv/pool layer with increasing num filters from 32 to 48
# 3. add more layers once flattened
# 4. add regularization l2
cnn4 = Sequential()
cnn4.add( Reshape((1,w,h), input_shape=(1,200)) )
num_filt_layers = [48, 48]
for num_filters in num_filt_layers:
    cnn4.add(Conv2D(filters=num_filters,
                    kernel_size=(3, 3),
                    padding='same', data_format="channels_first"))
    cnn4.add(Activation('relu'))
    cnn4.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_first"))
# add one layer on flattened output
cnn4.add( Flatten() )
cnn4.add( Dense(100, kernel_regularizer = l2(0.001)))
cnn4.add( Activation('relu') )
cnn4.add( Dense(CLASSES, kernel_regularizer = l2(0.001)))
cnn4.add( Activation('softmax') )
# Let's train the model
cnn4.compile(loss='mean_squared_error',
optimizer='rmsprop',
metrics=['accuracy',f1])
# we need to expand the dimensions here to give the
# "channels" dimension expected by Keras
cnn4_history = []
history_fourth = []
classifier_4 = []
for i in range(10):
    history = cnn4.fit(np.expand_dims(train_pca_200[i], axis=1), y_train_ohe[i],
                       batch_size=32, epochs=epochs,
                       shuffle=True, verbose=0,
                       validation_data=(np.expand_dims(test_pca_200[i], axis=1), y_test_ohe[i]))
    cnn4_history.append(cnn4)
    history_fourth.append(history)
    classifier_4.append(contingency_column(cnn4_history[i], test_pca_200[i], y_test[i]))
    compare_mlp_cnn(cnn4_history[i], mlp_history[i], test_pca_200[i], y_test[i])
From above we can see that our cnn4 model performs well for all the classes.
history_fourth[0].history['val_f1']
datagen = ImageDataGenerator(featurewise_center=False,
samplewise_center=False,
featurewise_std_normalization=False,
samplewise_std_normalization=False,
zca_whitening=False,
rotation_range=5, # used, Int. Degree range for random rotations.
width_shift_range=0.1, # used, Float (fraction of total width). Range for random horizontal shifts.
height_shift_range=0.1, # used, Float (fraction of total height). Range for random vertical shifts.
shear_range=0., # Float. Shear Intensity (Shear angle in counter-clockwise direction as radians)
zoom_range=0.,
channel_shift_range=0.,
fill_mode='nearest',
cval=0.,
horizontal_flip=True,
vertical_flip=False,
rescale=None)
train_pca_reshape=[]
test_pca_reshape=[]
datagen_history=[]
for i in range(10):
    train_pca_temp = train_pca_200[i].reshape(len(train_pca_200[i]), w, h, 1)
    test_pca_temp = test_pca_200[i].reshape(len(test_pca_200[i]), w, h, 1)
    datagen.fit(train_pca_temp)
    train_pca_reshape.append(train_pca_temp)
    test_pca_reshape.append(test_pca_temp)
    datagen_history.append(datagen)
from sklearn import metrics as mt
from matplotlib import pyplot as plt
from skimage.io import imshow
import seaborn as sns
%matplotlib inline
def summarize_net(net, X_test, y_test, title_text=''):
    plt.figure(figsize=(15,5))
    yhat = np.argmax(net.predict(X_test), axis=1)
    f1 = mt.f1_score(y_test, yhat, average="macro")
    cm = mt.confusion_matrix(y_test, yhat)
    cm = cm / np.sum(cm, axis=1)[:, np.newaxis]
    sns.heatmap(cm, annot=True, fmt='.2f')
    plt.title(title_text + '{:.4f}'.format(f1))
%%time
cnn = Sequential()
# let's start with an AlexNet style convolutional phase
cnn.add(Conv2D(filters=32,
input_shape = (w,h,1),
kernel_size=(3,3),
padding='same',
activation='relu', data_format="channels_last")) # more compact syntax
# no max pool before next conv layer!!
cnn.add(Conv2D(filters=64,
kernel_size=(3,3),
padding='same',
activation='relu')) # more compact syntax
cnn.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
# add one layer on flattened output
cnn.add(Dropout(0.25)) # add some dropout for regularization after conv layers
cnn.add(Flatten())
cnn.add(Dense(128, activation='relu'))
cnn.add(Dropout(0.5)) # add some dropout for regularization, again!
cnn.add(Dense(CLASSES, activation='softmax'))
# Let's train the model
cnn.compile(loss='categorical_crossentropy', # 'categorical_crossentropy' 'mean_squared_error'
optimizer='rmsprop', # 'adadelta' 'rmsprop'
metrics=['accuracy',f1])
# the flow method yields batches of images indefinitely, with the given transformations
cnn_generator = []
history_cnn=[]
classifier_adv = []
for i in range(10):
    print("fold: " + str(i+1))
    history = cnn.fit_generator(datagen.flow(train_pca_reshape[i], y_train_ohe[i], batch_size=128),
                                steps_per_epoch=int(len(train_pca_reshape[i])/128), # number of batches per epoch
                                epochs=epochs, verbose=1, validation_data=(test_pca_reshape[i], y_test_ohe[i]),
                                callbacks=[EarlyStopping(monitor='val_loss', patience=2)])
    history_cnn.append(history)
    cnn_generator.append(cnn)
    classifier_adv.append(contingency_column_mlp(cnn_generator[i], test_pca_200[i], y_test[i]))
for i in range(10):
    summarize_net(cnn_generator[i], test_pca_reshape[i], y_test[i], title_text='Using Expansion:')
From the confusion matrices above we can see that our advanced cnn1 model performs poorly on class 3 (pneumonia-viral), with an F1 score of around 0.3.
#Use the Validation Data
from keras.callbacks import EarlyStopping
from keras.regularizers import l2
l2_lambda = 0.0001
# Use Kaiming He to regularize ReLU layers: https://arxiv.org/pdf/1502.01852.pdf
# Use Glorot/Bengio for linear/sigmoid/softmax: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
cnn = Sequential()
cnn.add(Conv2D(filters=32,
input_shape = (w,h,1),
kernel_size=(3,3),
kernel_initializer='he_uniform',
kernel_regularizer=l2(l2_lambda),
padding='same',
activation='relu',
data_format="channels_last")) # more compact syntax
cnn.add(Conv2D(filters=32,
kernel_size=(3,3),
kernel_initializer='he_uniform',
kernel_regularizer=l2(l2_lambda),
padding='same',
activation='relu',data_format="channels_last"))
cnn.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
cnn.add(Conv2D(filters=64,
input_shape = (w,h,1),
kernel_size=(3,3),
kernel_initializer='he_uniform',
kernel_regularizer=l2(l2_lambda),
padding='same',
activation='relu',data_format="channels_last")) # more compact syntax
cnn.add(Conv2D(filters=64,
kernel_size=(3,3),
kernel_initializer='he_uniform',
kernel_regularizer=l2(l2_lambda),
padding='same',
activation='relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
cnn.add(Conv2D(filters=128,
input_shape = (w,h,1),
kernel_size=(3,3),
kernel_initializer='he_uniform',
kernel_regularizer=l2(l2_lambda),
padding='same',
activation='relu',data_format="channels_last")) # more compact syntax
cnn.add(Conv2D(filters=128,
kernel_size=(3,3),
kernel_initializer='he_uniform',
kernel_regularizer=l2(l2_lambda),
padding='same',
activation='relu',data_format="channels_last"))
# add one layer on flattened output
cnn.add(Flatten())
cnn.add(Dropout(0.25)) # add some dropout for regularization after conv layers
cnn.add(Dense(128,
activation='relu',
kernel_initializer='he_uniform',
kernel_regularizer=l2(l2_lambda)
))
cnn.add(Dropout(0.5)) # add some dropout for regularization, again!
cnn.add(Dense(CLASSES,
activation='softmax',
kernel_initializer='glorot_uniform',
kernel_regularizer=l2(l2_lambda)
))
# Let's train the model
cnn.compile(loss='categorical_crossentropy', # 'categorical_crossentropy' 'mean_squared_error'
optimizer='rmsprop', # 'adadelta' 'rmsprop'
metrics=['accuracy',f1])
# the flow method yields batches of images indefinitely, with the given transformations
cnn_generator_2 = []
history_cnn_2=[]
classifier_adv2 = []
for i in range(10):
    print("fold: " + str(i+1))
    history = cnn.fit_generator(datagen.flow(train_pca_reshape[i], y_train_ohe[i], batch_size=128),
                                steps_per_epoch=int(len(train_pca_reshape[i])/128), # number of batches per epoch
                                epochs=epochs, verbose=1, validation_data=(test_pca_reshape[i], y_test_ohe[i]),
                                callbacks=[EarlyStopping(monitor='val_loss', patience=2)])
    history_cnn_2.append(history)
    cnn_generator_2.append(cnn)
    classifier_adv2.append(contingency_column_mlp(cnn_generator_2[i], test_pca_200[i], y_test[i]))
for i in range(10):
    summarize_net(cnn_generator_2[i], test_pca_reshape[i], y_test[i], title_text='Using Exp.+Reg.+Init.:')
From the confusion matrices above we can see that our advanced cnn2 model also performs poorly on class 3 (pneumonia-viral), with an F1 score as low as 0.2.
%matplotlib inline
for i in range(10):
    legends = ['MLP','Simple CNN1','Simple CNN2','Simple CNN3','Simple CNN4','CNN Advanced 1','CNN Advanced 2']
    plt.figure(figsize=(20,60))
    plt.subplot(10,2,1)
    plt.plot(mlp_his[i].history['f1'])
    plt.plot(history_first[i].history['f1'])
    plt.plot(history_second[i].history['f1'])
    plt.plot(history_third[i].history['f1'])
    plt.plot(history_fourth[i].history['f1'])
    plt.plot(history_cnn[i].history['f1'])
    plt.plot(history_cnn_2[i].history['f1'])
    plt.legend(legends)
    plt.xlabel('epochs')
    plt.ylabel('f1_score %')
    plt.title('Training')
    plt.subplot(10,2,2)
    plt.plot(mlp_his[i].history['val_f1'])
    plt.plot(history_first[i].history['val_f1'])
    plt.plot(history_second[i].history['val_f1'])
    plt.plot(history_third[i].history['val_f1'])
    plt.plot(history_fourth[i].history['val_f1'])
    plt.plot(history_cnn[i].history['val_f1'])
    plt.plot(history_cnn_2[i].history['val_f1'])
    plt.legend(legends)
    plt.xlabel('epochs')
    plt.ylabel('val_f1_score %')
    plt.title('Validation')
The MLP performs best among all the models on both the training and validation sets. For every model the validation F1 score is much lower than the training F1 score, yet the validation F1 computed with sklearn does not show a drop of the same magnitude. We are unsure why this happens: the models fit the training set very well and then perform poorly on the validation set, which suggests overfitting. We also used L2 regularization and dropout to mitigate overfitting, but they did not make a noticeable difference. With further hyperparameter tuning we might have found a way to prevent this overtraining.
Methods = ['MLP','Simple CNN1','Simple CNN2','Simple CNN3','Simple CNN4','CNN Advanced 1','CNN Advanced 2']
print('| {:^8} | {:^15} | {:^21}| {:^21} '.format('fold','Method','f1','validate f1'))
print('| {:^8} | {:^15} | {:^21}| {:^21} '.format('', '', '', ''))
for i in range(10):
    print('| {:^8} | {:^15} | {:^21}| {:^21} '.format(i+1, Methods[0], np.mean(mlp_his[i].history['f1']), np.mean(mlp_his[i].history['val_f1'])))
    print('| {:^8} | {:^15} | {:^21}| {:^21} '.format(i+1, Methods[1], np.mean(history_first[i].history['f1']), np.mean(history_first[i].history['val_f1'])))
    print('| {:^8} | {:^15} | {:^21}| {:^21} '.format(i+1, Methods[2], np.mean(history_second[i].history['f1']), np.mean(history_second[i].history['val_f1'])))
    print('| {:^8} | {:^15} | {:^21}| {:^21} '.format(i+1, Methods[3], np.mean(history_third[i].history['f1']), np.mean(history_third[i].history['val_f1'])))
    print('| {:^8} | {:^15} | {:^21}| {:^21} '.format(i+1, Methods[4], np.mean(history_fourth[i].history['f1']), np.mean(history_fourth[i].history['val_f1'])))
    print('| {:^8} | {:^15} | {:^21}| {:^21} '.format(i+1, Methods[5], np.mean(history_cnn[i].history['f1']), np.mean(history_cnn[i].history['val_f1'])))
    print('| {:^8} | {:^15} | {:^21}| {:^21} '.format(i+1, Methods[6], np.mean(history_cnn_2[i].history['f1']), np.mean(history_cnn_2[i].history['val_f1'])))
In statistics, McNemar's test is a statistical test used on paired nominal data. It is applied to 2 × 2 contingency tables with a dichotomous trait, with matched pairs of subjects, to determine whether the row and column marginal frequencies are equal (that is, whether there is "marginal homogeneity").
The test is applied to a 2 × 2 contingency table, which tabulates the outcomes of two tests on a sample of n subjects, as follows.
| Test 2 positive | Test 2 negative | Rowtotal | |
|---|---|---|---|
| Test 1 positive | a | b | a+b |
| Test 1 negative | c | d | c+d |
| Column total | a+c | b+d | n |
The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same, i.e. \begin{aligned}~p_{a}+p_{b}=p_{a}+p_{c}~\text{and}~p_{c}+p_{d}=p_{b}+p_{d}\end{aligned}
Thus the null and alternative hypotheses are
\begin{aligned}H_{0}&:~p_{b}=p_{c}\\H_{1}&:~p_{b}\neq p_{c}\end{aligned}Here $p_{a}$, etc., denote the theoretical probability of occurrences in cells with the corresponding label.
The McNemar test statistic is: \begin{aligned}\chi^2 = \frac{(b-c)^2}{b+c}\end{aligned}
Under the null hypothesis, with a sufficiently large number of discordants (cells b and c), $\chi^2$ has a chi-squared distribution with 1 degree of freedom. If the $\chi^2$ result is significant, this provides sufficient evidence to reject the null hypothesis, in favour of the alternative hypothesis that $p_{b} \neq p_{c}$, which would mean that the marginal proportions are significantly different from each other.
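Since only the discordant pairs enter the statistic, it can be sanity-checked directly from the counts b and c with scipy (the counts below are made-up illustrative numbers, not from our folds):

```python
from scipy.stats import chi2

# illustrative discordant counts: b = classifier 1 right / 2 wrong, c = the reverse
b, c = 12, 5

# McNemar statistic chi^2 = (b - c)^2 / (b + c)
stat = (b - c) ** 2 / (b + c)

# under H0 the statistic follows a chi-squared distribution with 1 degree of freedom
p_value = chi2.sf(stat, df=1)
print('statistic={:.3f}, p-value={:.3f}'.format(stat, p_value))
```

For small b + c the chi-squared approximation is unreliable, which is why an exact binomial version of the test (as in `mcnemar(..., exact=True)`) is often used instead.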
“McNemar's Test.” Wikipedia, Wikimedia Foundation, 22 Nov. 2019, https://en.wikipedia.org/wiki/McNemar's_test.
def make_contingency_table(classifier_1, classifier_2):
    '''
    input: per-fold lists of right/wrong indicators for classifier 1 and classifier 2
    '''
    table = [[0, 0], [0, 0]]
    for k_fold in range(len(classifier_1)):
        for idx in range(len(classifier_1[k_fold])):
            # row: classifier 1 correct (0) or wrong (1); column: same for classifier 2
            row = 0 if classifier_1[k_fold][idx] else 1
            col = 0 if classifier_2[k_fold][idx] else 1
            table[row][col] += 1
    return table
# Code taken from https://machinelearningmastery.com/mcnemars-test-for-machine-learning/
# Example of calculating the mcnemar test
from statsmodels.stats.contingency_tables import mcnemar
# define contingency table
table = [[4, 2], [2,4]]
def print_mcnemar(table):
    # calculate mcnemar test
    result = mcnemar(table, exact=True)
    # summarize the finding
    print('statistic=%.3f, p-value=%.3f' % (result.statistic, result.pvalue))
    # interpret the p-value
    alpha = 0.05
    if result.pvalue > alpha:
        print('Same proportions of errors (fail to reject H0)')
    else:
        print('Different proportions of errors (reject H0)')

print_mcnemar(table)
print_mcnemar(make_contingency_table(classifier_cnn_first, classifier_mlp_first))
print_mcnemar(make_contingency_table(classifier_cnn_first, classifier_2))
print_mcnemar(make_contingency_table(classifier_cnn_first, classifier_3))
print_mcnemar(make_contingency_table(classifier_cnn_first, classifier_4))
print_mcnemar(make_contingency_table(classifier_2, classifier_3))
print_mcnemar(make_contingency_table(classifier_2, classifier_4))
print_mcnemar(make_contingency_table(classifier_3, classifier_4))
Based on the results above, we find that all the simple CNN models have different proportions of errors, which means they are significantly different from each other; we will identify the best one below.
print_mcnemar(make_contingency_table(classifier_adv, classifier_adv2))
Based on the result above, we find that the two advanced CNN models have the same proportions of errors, so we can use either of them. In summary, the simple CNN models differ significantly from one another while the two advanced models do not; we now check which CNN model is best.
#Find the best model with highest validate_f1: (Using the average of validate f1)
avgcnn1_f1 = [] #Simple cnn1
avgcnn2_f1 = [] #Simple cnn2
avgcnn3_f1 = [] #Simple cnn3
avgcnn4_f1 = [] #Simple cnn4
avgcnn5_f1 = [] #Advanced cnn1
avgcnn6_f1 = [] #Advanced cnn2
for i in range(10):
    avgcnn1_f1.append(np.mean(history_first[i].history['val_f1']))
    avgcnn2_f1.append(np.mean(history_second[i].history['val_f1']))
    avgcnn3_f1.append(np.mean(history_third[i].history['val_f1']))
    avgcnn4_f1.append(np.mean(history_fourth[i].history['val_f1']))
    avgcnn5_f1.append(np.mean(history_cnn[i].history['val_f1']))
    avgcnn6_f1.append(np.mean(history_cnn_2[i].history['val_f1']))
first_cnn = np.mean(avgcnn1_f1)
second_cnn = np.mean(avgcnn2_f1)
third_cnn = np.mean(avgcnn3_f1)
fourth_cnn = np.mean(avgcnn4_f1)
adv_cnn1 = np.mean(avgcnn5_f1)
adv_cnn2 = np.mean(avgcnn6_f1)
f1_val = [first_cnn,second_cnn,third_cnn,fourth_cnn,adv_cnn1,adv_cnn2]
f1_val_name =['Simple CNN1','Simple CNN2','Simple CNN3','Simple CNN4','CNN Advanced 1','CNN Advanced 2']
max_f1 = f1_val[0]
method = f1_val_name[0]  # initialize so the best model is defined even if the first wins
for idx, score in enumerate(f1_val):
    if score > max_f1:
        max_f1 = score
        method = f1_val_name[idx]
print(' Our best model is : ' + str(method), "\n",
      "Validate f1 :" + str(max_f1))
Our best model is Simple CNN4, which has the highest validation F1 score.
#Compare our best model Simple cnn4 with mlp
avgmlp_f1 = []
for i in range(10):
    avgmlp_f1.append(np.mean(mlp_his[i].history['val_f1']))
plt.plot(avgmlp_f1)
plt.plot(avgcnn4_f1)
plt.legend(['MLP','Simple CNN4'])
plt.ylabel('val_f1_score %')
plt.xlabel('epochs')
plt.title('Comparison Validation Score')
Micro- and macro-averages will compute slightly different things, and thus their interpretation differs. A macro-average will compute the metric independently for each class and then take the average (hence treating all classes equally), whereas a micro-average will aggregate the contributions of all classes to compute the average metric. In a multi-class classification setup, micro-average is preferable since there might be class imbalance (we have many more examples of one class than of other classes).
“Computing AUC and ROC Curve from Multi-Class Data in Scikit-Learn (Sklearn)?” Stack Overflow, 2015, https://stackoverflow.com/questions/33547965/computing-auc-and-roc-curve-from-multi-class-data-in-scikit-learn-sklearn.
Contributor, Guest. “Understanding ROC Curves with Python.” Stack Abuse, Stack Abuse, 25 Feb. 2019, https://stackabuse.com/understanding-roc-curves-with-python/.
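As a quick illustration of the difference between the two averages, sklearn's `f1_score` supports both modes; the labels below are a made-up imbalanced toy example, not taken from our folds:

```python
from sklearn.metrics import f1_score

# toy 3-class problem with class imbalance (illustrative data only)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2]

# macro: unweighted mean of per-class F1 scores (each class counts equally)
macro = f1_score(y_true, y_pred, average='macro')
# micro: pool all true/false positives and negatives across classes
micro = f1_score(y_true, y_pred, average='micro')
print('macro={:.4f} micro={:.4f}'.format(macro, micro))
```

In this single-label multi-class setting the micro-averaged F1 equals the accuracy (0.8 here), while the macro average is pulled down by the weaker minority classes.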
First, we check each class's ROC performance.
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
# Modified from https://stackoverflow.com/questions/33547965/computing-auc-and-roc-curve-from-multi-class-data-in-scikit-learn-sklearn
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from scipy import interp
for j in range(10):
    print("Fold:" + str(j+1))
    y_pred = cnn4.predict(np.expand_dims(test_pca_200[j], axis=1))
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    mean_roc = 0
    for i in range(CLASSES):
        fpr[i], tpr[i], _ = roc_curve(y_test_ohe[j][:, i], y_pred[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])
        mean_roc += roc_auc[i]
    mean_roc = mean_roc/CLASSES
    colors = cycle(['blue', 'red', 'green'])
    for i, color in zip(range(CLASSES), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=2, linestyle=':',
                 label='ROC curve of CNN4 class {0} (area = {1:0.4f})'
                       ''.format(i, roc_auc[i]))
    y_pred_mlp = mlp.predict(test_pca_200[j])
    fpr_mlp = dict()
    tpr_mlp = dict()
    roc_auc_mlp = dict()
    mean_roc_mlp = 0
    for i in range(CLASSES):
        fpr_mlp[i], tpr_mlp[i], _ = roc_curve(y_test_ohe[j][:, i], y_pred_mlp[:, i])
        roc_auc_mlp[i] = auc(fpr_mlp[i], tpr_mlp[i])
        mean_roc_mlp += roc_auc_mlp[i]
    mean_roc_mlp = mean_roc_mlp/CLASSES
    colors_mlp = cycle(['deeppink', 'chartreuse', '#FFDD44'])
    for i, color in zip(range(CLASSES), colors_mlp):
        plt.plot(fpr_mlp[i], tpr_mlp[i], color=color, lw=2, linestyle='-.',
                 label='ROC curve of MLP class {0} (area = {1:0.4f})'
                       ''.format(i, roc_auc_mlp[i]))
    plt.plot([0, 1], [0, 1], 'k--', lw=2)
    plt.xlim([-0.05, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic to multi-class')
    plt.legend(loc="lower right")
    plt.show()
    print("The average ROC for CNN is:", mean_roc)
    print("The average ROC for MLP is:", mean_roc_mlp)
# Modified from https://stackoverflow.com/questions/33547965/computing-auc-and-roc-curve-from-multi-class-data-in-scikit-learn-sklearn
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from scipy import interp
for j in range(10):
    print("Fold:" + str(j+1))
    y_pred = cnn4.predict(np.expand_dims(test_pca_200[j], axis=1))
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(CLASSES):
        fpr[i], tpr[i], _ = roc_curve(y_test_ohe[j][:, i], y_pred[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])
    # Compute micro-average ROC curve and ROC area
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test_ohe[j].ravel(), y_pred.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
    y_pred_mlp = mlp.predict(test_pca_200[j])
    fpr_mlp = dict()
    tpr_mlp = dict()
    roc_auc_mlp = dict()
    for i in range(CLASSES):
        fpr_mlp[i], tpr_mlp[i], _ = roc_curve(y_test_ohe[j][:, i], y_pred_mlp[:, i])
        roc_auc_mlp[i] = auc(fpr_mlp[i], tpr_mlp[i])
    # Compute micro-average ROC curve and ROC area
    fpr_mlp["micro"], tpr_mlp["micro"], _ = roc_curve(y_test_ohe[j].ravel(), y_pred_mlp.ravel())
    roc_auc_mlp["micro"] = auc(fpr_mlp["micro"], tpr_mlp["micro"])
    # Plot all ROC curves
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average ROC curve CNN4 (area = {0:0.4f})'
                   ''.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=2)
    plt.plot(fpr_mlp["micro"], tpr_mlp["micro"],
             label='micro-average ROC curve MLP (area = {0:0.4f})'
                   ''.format(roc_auc_mlp["micro"]),
             color='navy', linestyle=':', linewidth=2)
    plt.plot([0, 1], [0, 1], 'k--', lw=2)
    plt.xlim([-0.05, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic to multi-class')
    plt.legend(loc="lower right")
    plt.show()
Based on the ROC results, the MLP model performs best, but our Simple CNN4 also performs well.